Learning Reward Machines: A Study in Partially Observable Reinforcement Learning

Authors

Abstract

Reinforcement Learning (RL) is a machine learning paradigm wherein an artificial agent interacts with an environment with the purpose of learning behaviour that maximizes the expected cumulative reward it receives from the environment. Reward machines (RMs) provide a structured, automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is a policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.
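To make the automata-based representation concrete, here is a minimal sketch of a reward machine as a finite-state machine over abstract propositional events, emitting a reward on each transition. The two-state task below (observe event "a", then event "b") is a hypothetical illustration, not an example taken from the paper.

```python
class RewardMachine:
    """A minimal reward machine: finite states, event-labelled transitions,
    and a reward attached to each transition."""

    def __init__(self, transitions, initial_state):
        # transitions: {(state, event): (next_state, reward)}
        self.transitions = transitions
        self.initial = initial_state
        self.state = initial_state

    def step(self, event):
        """Advance on an observed event; unlisted events self-loop with 0 reward."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward

    def reset(self):
        self.state = self.initial


# Reward 1.0 only when "a" is observed before "b"; any other order yields nothing.
rm = RewardMachine(
    transitions={
        ("u0", "a"): ("u1", 0.0),
        ("u1", "b"): ("u2", 1.0),
    },
    initial_state="u0",
)
rewards = [rm.step(e) for e in ["b", "a", "b"]]
print(rewards)  # [0.0, 0.0, 1.0]
```

Each RM state corresponds to one subproblem for which a memoryless policy can be learned, which is the decomposition the abstract refers to.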


Similar articles

Inverse Reinforcement Learning in Partially Observable Environments

Inverse reinforcement learning (IRL) is the problem of recovering the underlying reward function from the behaviour of an expert. Most of the existing algorithms for IRL assume that the expert’s environment is modeled as a Markov decision process (MDP), although they should be able to handle partially observable settings in order to widen the applicability to more realistic scenarios. In this p...


Free-energy-based reinforcement learning in a partially observable environment

Free-energy-based reinforcement learning (FERL) can handle Markov decision processes (MDPs) with high-dimensional state spaces by approximating the state-action value function with the negative equilibrium free energy of a restricted Boltzmann machine (RBM). In this study, we extend the FERL framework to handle partially observable MDPs (POMDPs) by incorporating a recurrent neural network that ...


Hierarchical Reinforcement Learning for a Robotic Partially Observable Task

Most real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately, we illustrate on a complex robotic task that addressing both problems simultaneously is simpler and more efficient. We decompose our complex partially observable task into a set of sub-tasks,...


Nonparametric Bayesian Approaches for Reinforcement Learning in Partially Observable Domains

The objective of my doctoral research is to bring together two fields: partially observable reinforcement learning (PORL) and non-parametric Bayesian statistics (NPB) to address issues of statistical modeling and decision-making in complex, real-world domains.


Market-Based Reinforcement Learning in Partially Observable Worlds

Unlike traditional reinforcement learning (RL), market-based RL is in principle applicable to worlds described by partially observable Markov Decision Processes (POMDPs), where an agent needs to learn short-term memories of relevant previous events in order to execute optimal actions. Most previous work, however, has focused on reactive settings (MDPs) instead of POMDPs. Here we reimplement a r...



Journal

Journal title: Artificial Intelligence

Year: 2023

ISSN: 2633-1403

DOI: https://doi.org/10.1016/j.artint.2023.103989